In-set/out-of-set speaker identification based on discriminative speech frame selection
Authors
Abstract
In this paper, we propose a novel discriminative speech frame selection (DSFS) scheme for in-set/out-of-set speaker identification. The scheme seeks to decrease the similarity between the speaker models and the background model (or anti-speaker model) and thereby increase the accuracy of speaker identification. DSFS consists of two steps: speech frame analysis and discriminative frame selection. Two methods are used to perform DSFS: (i) a Teager Energy Operator (TEO) energy-based method and (ii) a MELP pitch-based method. An evaluation on both clean and noisy corpora, covering single and multiple recording sessions, shows that both the TEO energy-based and the MELP pitch-based DSFS schemes reduce the equal error rate (EER) dramatically relative to a traditional GMM-UBM baseline system. Compared with traditional GMM speaker identification, DSFS selects only discriminative speech frames and therefore considers only discriminative features. This selection decreases the overlap between the speaker models and the background model and improves the performance of in-set/out-of-set speaker identification.
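As a rough illustration of the two steps named in the abstract (frame analysis, then frame selection), the following sketch scores each frame by its mean absolute Teager energy and keeps the highest-scoring frames. The discrete TEO formula, psi[n] = x[n]^2 - x[n-1]*x[n+1], is standard. Everything else is an assumption made for illustration: the abstract does not state the selection criterion, so the function names, the frame length, the hop size, and the `keep_ratio` rule are hypothetical and should not be read as the paper's method.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def split_frames(x, frame_len, hop):
    """Cut a signal into overlapping fixed-length frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def select_frames(x, frame_len=200, hop=80, keep_ratio=0.5):
    """Return sorted indices of the frames with the highest mean |TEO| energy.

    keep_ratio is a hypothetical knob, not a value taken from the paper.
    """
    frames = split_frames(x, frame_len, hop)
    scores = []
    for frame in frames:
        psi = teager_energy(frame)
        scores.append(sum(abs(v) for v in psi) / len(psi))
    n_keep = max(1, int(len(frames) * keep_ratio))
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy signal: a quiet 100 Hz tone followed by a loud one (8 kHz sampling).
quiet = [0.01 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
loud = [1.0 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
kept = select_frames(quiet + loud)
print(kept)
```

In a full system, only the feature vectors of the kept frames would be passed on to speaker-model and background-model scoring; that stage is not shown here.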
Similar resources
Dimension reduction for speaker identification based on mutual information
Dimension reduction is a necessary step for speech feature extraction in a speaker identification system. Discrete Cosine Transform (DCT) or Principal Component Analysis (PCA) is widely used for dimension reduction. By choosing the basis vectors from the DCT or PCA basis vector pool that contribute more to the data distribution variance or the reconstruction accuracy of the speech data set, we can transform th...
Improving native accent identification using deep neural networks
In this paper, we utilize deep neural networks (DNNs) to automatically identify native accents in English and Mandarin when no text, speaker or gender information is available for the speech data. Compared to conventional Gaussian mixture model (GMM) based methods, the proposed method benefits from two main advantages: first, DNNs are discriminative models which can provide better discriminat...
Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification using feature selection and feature-space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...
Speaker Identification and Verification Using Support Vector Machines and Sparse Kernel Logistic Regression
In this paper we investigate two discriminative classification approaches for frame-based speaker identification and verification, namely Support Vector Machine (SVM) and Sparse Kernel Logistic Regression (SKLR). SVMs have already shown good results in regression and classification in several fields of pattern recognition as well as in continuous speech recognition. While the non-probabilistic ...
Robust Identification of Smart Foam Using Set Membership Estimation in a Model Error Modeling Framework
The aim of this paper is robust identification of smart foam, as an electroacoustic transducer, considering unmodeled dynamics due to nonlinearities in behaviour at low frequencies and measurement noise at high frequencies as existent uncertainties. Set membership estimation combined with model error modelling technique is used where the approach is based on worst case scenario with unknown but...
Journal title:
Volume, issue:
Pages: -
Publication date: 2005